1 Introduction

What explains variation in COVID-19 vaccination uptake across U.S. counties? By mid-2021, only 60% of Americans have been partially vaccinated against COVID-19. Vaccination uptake varies widely between counties (percentiles 10th=32%; 25th=37%; 50th=44%; 75th=52%; 90th=60%; entropy=3.81 bits), and within and between states (Figure 1). Currently, there is a surplus of over 60 million vaccine doses that have been delivered but not yet administered (CDC 2020b). As of this writing, there have been at least 600 thousand COVID-19 deaths, cases are increasing again nationally, and forecast models expect thousands of deaths in the coming weeks (CDC 2020a). Every percent increase in vaccine uptake has the potential to prevent thousands of deaths, tens of thousands of hospitalizations, and hundreds of thousands of infections (Bartsch et al. 2021). Understanding the mismatch between available vaccine doses and unvaccinated communities is therefore an immediate health and welfare priority.

A caption


This article investigates possible data generating processes behind variation in U.S. vaccine uptake across counties. Our intended contributions are as follows: (1) A standardized benchmark for COVID-19 uptake that can be used to compare the performance of ours and other models; (2) A standardized cross-validation strategy for evaluating performance out of sample; (3) A synthesis of the existing literature on vaccine uptake into plausible data generating processes; (4) A high performing model that explains the majority of variation in vaccine uptake in the U.S.; and (5) Extensive ablation analysis that precisely highlights the importance of subsets of features.

The article is organized into the following sections. The first section provides a brief literature review of comparable vaccine uptake projects for COVID-19 in the U.S. The next section defines the outcome, unit of analysis, and domain. We then describe our measurement strategy for vaccine uptake, and propose a train, validation, and test strategy.

2 Current State of the Art

We organize the research on vaccine uptake along the following lines. In this section we summarize only the most proximate literature on explaining COVID-19 vaccine uptake across U.S. geographic units, listed in Table 1. In the next section, which discusses our outcome, we review relevant measurement projects attempting to record uptake accurately. Finally, in the section theorizing possible data generating processes for vaccine uptake, we summarize the extensive literature on vaccine hesitancy, vaccine supply, previous vaccine campaigns, etc.

| Cite | Unit | Hold Out | Source | Reported Performance | Features | Functional Form |
|---|---|---|---|---|---|---|
| | County-Week | 1, 2, 4, 8 week rolling future | CovidActNow | MAE 3.82%, 5.98%, 9.45%, 16.05% | education, race, age, covid, population | Bayesian Autoregressive Spatial Beta Linear Model |
| Mishra et al. (2021) | County | None | CDC | R^2=0.17 | historic under-vaccination, sociodemographic barriers, resource constrained healthcare, healthcare accessibility, irregular care-seeking | Multi-level linear model |
| Stewart et al. (2021) | County | None | Covidcast | None | COVID-19 Community Vulnerability Index | Multilevel Linear Model (Population Weighted) |
| Pathak, Menard, and Garcia (2021) | NA | NA | NA | NA | NA | NA |

The state of the art for explaining COVID-19 vaccine uptake across geographic units in the U.S. is poorly defined and incomplete. There is no accepted standard benchmark for comparing the performance of competing explanations; each study picks its own subset of units, time cutoff, and data source on an ad hoc basis. With the exception of a very small literature attempting to forecast vaccine rates, this work is largely exploratory, focusing on whether a simple model places non-zero weight on a feature and either reporting only in-sample performance or ignoring performance altogether.

Forecasting work projects known uptake rates some number of weeks into the future (Chernyavskiy, Richardson, and Ratcliffe 2021). It addresses a different question, what explains change in vaccine uptake over time, and so predicts future values conditional not just on features but also on past known values. We instead seek to understand which features measured prior to the start of the rollout explain why the entire vaccine rollout was more successful in some parts of the country than others.

Mishra et al. (2021) predict county-level vaccine uptake using CDC data with a multi-level linear model fit to hand-engineered ranking indexes of 28 county measures organized into 5 themes (historic under-vaccination, sociodemographic barriers, resource-constrained healthcare system, healthcare accessibility barriers, and irregular care-seeking behavior). Their best performing model has a marginal \(R^2\) of only 0.17, which, while not directly comparable, does illustrate the low starting baseline for accounting for uptake variation in the existing literature. Stewart et al. (2021) fit population-weighted multilevel models of uptake as measured by COVIDcast and find nonzero weights placed on the COVID-19 Community Vulnerability Index, but do not report performance.

The next most direct work is descriptive

3 Vaccine Uptake

3.1 Outcome, Performance Metric, Unit of Analysis, and Domain

Our outcome (\(VaccineUptake\)) is the number of persons at least partially vaccinated in each county (\(Vaccinated\)) divided by the over-18 population measured by the recent 2020 census (\(Pop18+\)). We choose this specific outcome instead of nearby alternatives, such as fully vaccinated or percent of eligible population, for several reasons. First, our substantive practical interest is in why some Americans who are not currently vaccinated might become vaccinated in the near future. Those who receive one dose are likely to receive the second, those who do not at least have partial immunity, and for measurement purposes some vaccines, such as Johnson & Johnson, require only one dose and constitute full vaccination. Second, eligibility criteria are endogenous to uptake across the groups first in line and are in any case nearly uniform across states now. We choose the denominator of the 18-plus population because it is much more accurately measured in the recent 2020 census release than the 12-plus population, which is based on extrapolations from the 2010 census, and because vaccination for children 12 to 18 is still much less common in the U.S.

We choose out of sample root mean squared error (RMSE) as our performance metric \(L(\hat{VaccineUptake_{oob}}, VaccineUptake_{oob})\).
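As a minimal sketch of the outcome and the performance metric defined above, the following computes county uptake as a ratio and scores predictions by RMSE. The county identifiers, counts, and predictions are made up for illustration, and the function names are hypothetical rather than taken from the authors' code.

```python
import math


def vaccine_uptake(vaccinated, pop_18_plus):
    """Share of the 18+ population with at least one dose, per county."""
    return {fips: vaccinated[fips] / pop_18_plus[fips] for fips in vaccinated}


def rmse(predicted, observed):
    """Root mean squared error over the counties present in `observed`."""
    squared_errors = [(predicted[fips] - observed[fips]) ** 2 for fips in observed]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical counts for three made-up counties (not real data).
vaccinated = {"A": 400, "B": 300, "C": 500}
pop_18_plus = {"A": 1000, "B": 1000, "C": 1000}
uptake = vaccine_uptake(vaccinated, pop_18_plus)

# Out-of-sample predictions would come from a model that never saw these counties.
predictions = {"A": 0.5, "B": 0.3, "C": 0.4}
error = rmse(predictions, uptake)
```

Because the errors are squared before averaging, a few badly missed counties raise RMSE more than many small misses do.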

We limit our domain to the continental United States, excluding Alaska, Hawaii, and island territories because of their unique logistical constraints in vaccine distribution. Our unit of analysis is the U.S. county at a single time point, July 1, 2021. Counties are the smallest disaggregation available for the entire U.S., as zip-code level data are available for only a few states, and the cross-unit measurement error already encountered at the county level makes us dubious about zooming in with current data.

3.2 Measurement Strategy

Our measurement strategy is based on official statistics compiled by county and state health departments, further aggregated into a national panel by third parties. We consider four national panels compiled by the CDC (“COVID-19 Vaccinations in the United States, County | Data | Centers for Disease Control and Prevention,” n.d.), Vaccinetracking.us (“Data,” n.d.), CovidActNow (“Data Definitions | Covid Act Now,” n.d.), and the USA Today News Network. For a random sample of counties and a number of outliers, we compared these national panels to direct reporting on state and county websites. We concluded that there are sources of measurement error (see Appendix 1) that necessitate aggregating across panels, taking the mean uptake for each county (\(c\)) reported, \(Y_c=mean_c(Vaccinetracking.us_c, CovidActNow_c)\). We use \(USAToday_c\), which tracks only completed vaccination, as a check for outliers, and we exclude \(CDC_c\) entirely due to extraordinary missingness and underreporting.
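The aggregation in the formula above can be sketched as follows. The mean over the two panels follows the formula. The specific outlier rule (flag a county whose at-least-one-dose share falls below the completed-vaccination share) is an assumption made here for illustration, since the text does not spell out how the USA Today check is applied. All values are made up.

```python
from statistics import mean


def aggregate_uptake(vaccinetracking, covidactnow):
    """Mean of the panels that report a value for each county."""
    combined = {}
    for fips in set(vaccinetracking) | set(covidactnow):
        values = [panel[fips] for panel in (vaccinetracking, covidactnow)
                  if panel.get(fips) is not None]
        if values:
            combined[fips] = mean(values)
    return combined


def flag_outliers(combined, usatoday_completed):
    """Assumed check: at-least-one-dose share should not be below completed share."""
    return sorted(fips for fips, y in combined.items()
                  if usatoday_completed.get(fips) is not None
                  and y < usatoday_completed[fips])


# Hypothetical panel values for three made-up counties.
vaccinetracking = {"A": 0.40, "B": 0.30, "C": None}
covidactnow = {"A": 0.50, "B": 0.32, "C": 0.20}
usatoday_completed = {"A": 0.35, "B": 0.25, "C": 0.30}

combined = aggregate_uptake(vaccinetracking, covidactnow)
flagged = flag_outliers(combined, usatoday_completed)
```

In this toy input only county C is flagged, because its combined first-dose share (0.20) is lower than its reported completed share (0.30).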

There are at least three main measurement error concerns that we are aware of. The first is that states and counties either fail to report, or national panels fail to pull correct counts, for a non-random subset of states and counties. This error was most pronounced in the CDC panel and makes it inappropriate for this kind of analysis, despite it being the most relied upon source in the literature. The second is reporting error by states that fail to correctly record the home county of the individual, either missing the information entirely or incorrectly attributing it to the county where the vaccine was administered. Of the panels, only CovidActNow explicitly attempts to correct for entirely missing county records due to policy in a handful of states. Together, we find the biggest threat to measurement to come from under-reporting rather than over-reporting. The most likely negative consequences of our decision to take the maximum across sources are to attenuate variation between neighboring counties and to replace ad hoc measurement failure with ad hoc imputation by CovidActNow for a handful of states. We find both risks greatly preferable to the known problems with the data as is, and we mitigate them in our modeling strategy. The third is that county data do not consistently take into account doses administered through federal sources, e.g. the Department of Defense, Veterans Affairs, Homeland Security, etc. Here too we attempt to mitigate this with features that measure military and veteran presence and, in the case of the VA, counts of doses administered by installation in a given county.

3.3 Train, Validation, and Test Strategy

Our goal is to decompose observed vaccine uptake into an unmeasured error component and a measured data generating process, orthogonal to the error component, whose contribution comes only through the features. We further want to evaluate many possible data generating processes. These goals are challenging for observational social science data and require specific inferential strategies.

One challenge is that our data are a one-time collection: we cannot inductively form hypotheses and then requisition a new batch of data from the same data generating process (DGP) to evaluate them. We therefore need to partition the data in hand into a training split where we fit models, a validation split where we refine those models inductively, and a final test split where we evaluate our final decisions and ideas. Our confidence that a possible DGP is the true one rests on how well it generalizes to new unseen draws from the same process. From an information theoretic perspective, we desire a compact representation that removes idiosyncratic information, leaving only information that generalizes to any draws from the DGP. This is in contrast to the typical practice in the social sciences of making strong theoretical assumptions about the DGP and then providing evidence in the form of a nonzero weight placed on a feature by a model fit to in-sample data. That practice is inappropriate here for several reasons: we desire to explain variation in vaccine uptake; we don’t have strong prior beliefs on the role of any single feature; a nonzero weight assigned by a model is not strong evidence that a particular feature is important (Bzdok, Engemann, and Thirion 2020); repeated testing of the same in-sample data quickly diminishes the information learned by each such test (Thompson et al. 2020); considering more than a few features quickly diminishes the intended interpretation of those weights (Achen 2005); and any sufficiently flexible functional form makes simply memorizing a dataset trivial.

A second challenge is that our data are not independent and identically distributed (IID) draws from the same DGP, which reduces the amount of unique information available and makes selection of splits difficult (Roberts et al. 2017). Our county observations are administratively correlated at the state level, at a minimum through vaccine distribution strategies and data reporting mechanisms. They are also spatially correlated, at a minimum through transportation logistics, the flow of vaccine seekers from one county to another, and reporting error that assigns some vaccinations to the location where they were administered rather than the home address of the person. Training on a county and then testing on its immediate neighbor may end up memorizing local geographic patterns rather than the actual role of features.

Our train, validation, and test strategy is based on consensus features within nested cross-validation (Parvandeh et al. 2020). We split our county data along state lines into 5 large geographically contiguous regions shown in Figure 1. We choose regions by hierarchically clustering states based on total human interstate travel between each pair of states in 2019, measured by cell phone locations collected by SafeGraph (Kang et al. 2020). An outer loop withholds a region as a test set, which is used only for making out-of-sample predictions, never for training. An inner loop fits a model 4 times in 4-fold cross-validation, each iteration training on 3 regions and predicting on 1 validation region. This cross-validation step allows us to choose model hyper-parameters, the most important of which is which features to include in the final model.
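The split structure described above can be sketched as a generator over regions. This is only an illustration of the outer and inner loops: the region labels are placeholders, and the function name is hypothetical.

```python
def nested_region_splits(regions):
    """Yield (test_region, inner_folds) for leave-one-region-out nested CV.

    Each inner fold is a (train_regions, validation_region) pair drawn only
    from the regions that remain after the test region is withheld.
    """
    for test_region in regions:
        remaining = [r for r in regions if r != test_region]
        inner_folds = [
            ([r for r in remaining if r != validation_region], validation_region)
            for validation_region in remaining
        ]
        yield test_region, inner_folds


regions = ["R1", "R2", "R3", "R4", "R5"]  # placeholder region names
splits = list(nested_region_splits(regions))
```

With 5 regions this yields 5 outer iterations, each containing 4 inner folds that train on 3 regions and validate on 1, so the withheld test region never influences model or feature selection.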

A caption


Our feature selection strategy proceeds in three steps. First, our modelling technology is gradient boosted trees (GTB), a greedy algorithm that iteratively includes features until a stopping condition is met. This immediately prunes most features, which are never selected in any of the folds. Second, we subset to consensus features that are chosen in at least 2 of the 4 CV folds, further reducing to just those features that found broad geographic support. Our final step is to drop features which do not improve out-of-sample CV performance. Directly checking the performance of every subset is intractable (there are \(2^k\) possible subsets of \(k\) features), and so we rank order features by suspected importance and test the \(k\) cumulative sets. Our measure of importance is the LossSHAP value, which is the change in residuals on the hold-out validation set when we subtract off the marginal contribution (SHAP value) of each feature (Lundberg et al. 2020). This criterion is most directly relevant to the outcome we care about, performance on unseen out-of-sample data, and allows for the possibility of negative importance where a feature leads to over-fitting and doesn’t generalize.
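The second and third steps can be sketched as below. This is a simplified illustration, not the authors' pipeline: the scoring function `cv_rmse` is a hypothetical stand-in for refitting the model and measuring validation RMSE, and the feature names and scores are made up.

```python
from collections import Counter


def consensus_features(selected_per_fold, min_folds=2):
    """Features selected by the model in at least `min_folds` CV folds."""
    counts = Counter(f for fold in selected_per_fold for f in set(fold))
    return sorted(f for f, n in counts.items() if n >= min_folds)


def cumulative_sets(ranked_features):
    """The k nested sets: top 1, top 2, ..., top k ranked features."""
    return [ranked_features[:i] for i in range(1, len(ranked_features) + 1)]


def best_cumulative_set(ranked_features, cv_rmse):
    """Pick the nested set with the lowest validation RMSE.

    `cv_rmse` is a caller-supplied function: feature list -> validation RMSE.
    """
    return min(cumulative_sets(ranked_features), key=cv_rmse)


# Hypothetical features selected in each of 4 CV folds.
folds = [["a", "b"], ["a", "c"], ["a", "b", "d"], ["e"]]
kept = consensus_features(folds)

# Toy scores keyed by set size, standing in for a real refit-and-score step.
toy_scores = {1: 8.0, 2: 7.0, 3: 7.5}
best = best_cumulative_set(["a", "b", "c"], lambda feats: toy_scores[len(feats)])
```

Evaluating only the nested sets costs one model fit per feature, at the price of depending on the quality of the importance ranking.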

We then finalize our feature and hyper parameter selection, fit a single model to the full data of all 4 training folds, and make a single out of sample prediction on the remaining 1 test fold. By doing this full procedure separately for each of the 5 folds, we produce full out of bag predictions for every county in the U.S. The results can be directly interpreted as the amount of out of sample performance in predicting vaccine uptake that can be purchased with a given set of features and fixed regularization budget. How useful a feature is can be interpreted directly in terms of how much unique performance it buys, and its role in the DGP can be interrogated via the functional form chosen across all of the individual models.

3.4 Identification Strategy, Placebo Comparisons, and Feature Evaluation

Our research design is purely observational. We make no claim to exogeneity, nor do we believe that even our large number of features ‘control’ for relevant confounding. Instead, our strategy is an information theoretic one: to propose and rigorously evaluate compact representations that potentially generalize to more draws from the same data generating process. We supplement this with placebo analysis to determine to what degree a representation is uniquely good for our outcome of vaccine uptake relative to other similar but unrelated outcomes (Eggers, Tuñón, and Dafoe 2021). We therefore consider a feature ‘important’ if it meets the following criteria: (1) it greatly improves the ability to predict the outcome in out-of-sample observations; (2) that improvement is unique to that feature and cannot be easily reconstructed through other features that do not share the same theoretical interpretation; and (3) that improvement is unique to our outcome. Features that meet all three criteria suggest further research using either more appropriate individual-level data or an experimental design with plausible causal identification. Features that do not may still be part of the true data generating process, but were not distinguished in the county-level data available here.

4 Potential Data Generating Processes of Vaccine Uptake

4.1 Historical Background

COVID-19 vaccinations began in the U.S. on December 14, 2020 (Affairs (ASPA) 2020). The U.S. Food and Drug Administration issued an emergency use authorization (EUA) for persons 16 years or older in December, 2020 which it expanded to 12 and older in May, 2021. The majority of vaccinations given in the U.S. are the two dose sequence by Pfizer, followed by Moderna, and then to a much smaller degree the single dose Johnson and Johnson which was briefly paused in April, 2021. The number of vaccinations given per day increased nearly monotonically before peaking nationally in mid-April at over 3 million doses per day, and then declining to a nadir of about half a million doses per day in July. With the advent of the SARS-CoV-2 Delta variant and a fourth wave of cases in the U.S. vaccination rates are beginning to increase again albeit much more slowly.

4.2 Potential Data Generating Process and Feature Proposals

4.3 Supply Side

Before May 10, 2021, vaccines were allocated in a tiered system that distributed lots across states (based on their total adult population) and federal agencies, which in turn chose health departments, hospitals, and retail pharmacies (“COVID-19 Vaccine Allocations,” n.d.). After May 10, individual locations could order vaccines directly from the supplier. States issued tiered eligibility schedules that prioritized the elderly, healthcare workers, essential workers, etc. States and cities further prioritized demographic and economic groups, sometimes at the level of individual zip codes (Schmidt et al. 2021).

We consider a data generating process for reported vaccinations that is a function of institutional reporting mechanisms, vaccine supply, and vaccine demand. Vaccine demand is defined here as the number of vaccines that would be administered given unconstrained supply. When supply is constrained in any way, or imperfectly measured, then demand is only partially observed to be at least as much as the given supply but possibly much higher. Likewise, vaccine supply is defined here as the number of vaccines that would be administered given unconstrained demand. With perfect demand and measurement, any unused vaccines provide an upper bound on possible demand. Institutional reporting mechanisms are the process by which actual vaccine uptake is mapped into publicly reported records. As demonstrated above, institutional measurement error of health outcomes has systematic nonrandom sources of error. In some cases those systematic components may be an even larger part of the data generating process than the underlying empirical data generating process we actually care about.
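One compact way to write the definitions above, offered only as a sketch: the symbols \(D_c\) (demand), \(S_c\) (supply), \(A_c\) (doses actually administered), \(R_c\) (reported doses), and the reporting map \(g_c\) are notation introduced here for illustration and do not appear in the original.

\[
A_c = \min(S_c, D_c), \qquad R_c = g_c(A_c)
\]

Under this reading, when supply binds (\(A_c = S_c\)) the data reveal only that \(D_c \ge S_c\), and when doses go unused (\(A_c < S_c\)) they reveal \(D_c = A_c\), so that supply bounds demand from above. Any systematic distortion in \(g_c\) then enters the observed outcome alongside the supply and demand components.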

Age eligibility leads to major differences (Pathak, Menard, and Garcia 2021)

idiosyncratic state policies religious exemption https://www.tennessean.com/story/news/politics/2021/03/31/covid-19-vaccination-religious-exemptions-during-public-health-crisis-advance-senate/4825721001/ Parental exemption for teenagers https://www.tennessean.com/story/news/health/2021/07/12/tennessee-fires-top-vaccine-official-covid-19-shows-new-spread/7928699002/

VaxMap 2.0 Number of facilities and driving distance to a facility (n.d.)

4.4 Demand Side

Vaccinations per day increased week on week, peaking at an average of 3 million per day in early April, and are now tapering to about half a million per day.

The most proximate measure of demand available is survey self-reported desire to receive a vaccine. The COVID-19 Trends and Impact Survey (CTIS) is run by the Delphi group at Carnegie Mellon in partnership with Facebook (Barkay et al. 2020). We propose to measure demand most directly with survey questions of self-reported desire for a vaccination.

Predictors of willingness to get a COVID-19 vaccine in the U.S https://bmcinfectdis.biomedcentral.com/articles/10.1186/s12879-021-06023-9 (Kelly et al. 2021)

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7993557/ (Hughes et al. 2021)

social vulnerability (Hughes et al. 2021)

https://aspe.hhs.gov/pdf-report/vaccine-hesitancy


typically referred to as vaccine hesitancy,

(Truong et al. 2021)

(Pitts and Freeman 2021)

Veterans per county https://www.va.gov/vetdata/veteran_population.asp

(Brown, Young, and Pro 2021)

Large scale cross-national polling of vaccine confidence (Figueiredo et al. 2020).

Same features also predict deaths (Ruck, Bentley, and Borycz 2021)

features continue to be important at the subcounty neighborhood level (Lei 2021)

Download Options Covid-19 Vaccination Provider Locations in the United States

5 Results

We start broadly, considering the role of large non-mutually exclusive categories of features that share an intended substantive concept. We provide three views on the importance of each category of features. In each case, we evaluate features in terms of their impact on out-of-sample performance, reported here as RMSE and annotated with the percent change in error relative to a null intercept-only baseline, which has an RMSE of 11.3% and an MAE of 9%. For example, our best performing model/set of features has an RMSE of 6.74, which is a 40.5% reduction in predictive error from the baseline (shown as -40.5%).
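The baseline comparison can be sketched as below. The intercept-only baseline is taken here to mean predicting the training mean for every held-out county, which is an assumption about the exact procedure, and the function names are hypothetical. Because the baseline RMSE is reported to one decimal place, recomputing from the rounded figures can differ from the tabulated percentages in the last digit.

```python
import math
from statistics import mean


def intercept_only_rmse(train_values, test_values):
    """RMSE of predicting the training mean for every held-out observation."""
    intercept = mean(train_values)
    squared_errors = [(y - intercept) ** 2 for y in test_values]
    return math.sqrt(mean(squared_errors))


def percent_change_from_baseline(rmse, baseline_rmse):
    """Negative values mean the model's error is lower than the baseline's."""
    return 100.0 * (rmse - baseline_rmse) / baseline_rmse


# Using the rounded figures quoted in the text.
change = percent_change_from_baseline(6.74, 11.3)

# Toy check of the baseline itself on made-up uptake values.
toy_baseline = intercept_only_rmse([0.4, 0.6], [0.5, 0.7])
```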

Our first measure of feature importance is the predictive power of a model with access to only features from that category. Here, self-reported attitudes towards global warming and solutions to global warming provide the most direct information (RMSE 8.11, -28.4%). Alone, categories of features vary meaningfully in how much information they provide, from a great deal (voting and masking behavior) to hardly any (family structure and vaccine supply). Our second measure of importance is how much unique information a category of features provides, measured as the change in predictive power of a model with access to every feature but those from that category. Here, model performance degrades the most when voting behavior is withheld (RMSE 7.34, -35.2%). Unlike before, variation in importance across categories is much weaker, with many features providing redundant information that can be used to reconstruct the others: the reduction in error differs by only 4.4 percentage points from best to worst performance. While global warming beliefs held the most direct information, withholding them and keeping all others barely impacts the model, as they can be reconstructed from other features like education and voting behavior. Lastly, we rank order categories of features from most unique to least unique information and cumulatively remove the least unique (redundant) features. We find that not only does model performance not degrade, it improves, peaking when we exclude 6 categories of features: marriage, foreign born, covid, global warming, health conditions, and citizenship. Removing access to these features improves model fit and reduces over-fitting, and removing any more than these starts to degrade model performance.
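The three views differ only in which features the model is allowed to see. A minimal sketch of how those feature sets could be built is given below. The category contents are made up, the function names are hypothetical, and the convention that each cumulative step records the features remaining after that category is removed is an assumption.

```python
def only_category(categories, name):
    """View 1: the model sees only the features in one category."""
    return set(categories[name])


def all_but_category(categories, name):
    """View 2: the model sees every feature except those in one category."""
    every_feature = set().union(*categories.values())
    return every_feature - set(categories[name])


def cumulative_removal(categories, least_unique_first):
    """View 3: remove categories one after another, least unique first."""
    every_feature = set().union(*categories.values())
    removed = set()
    steps = []
    for name in least_unique_first:
        removed |= set(categories[name])
        steps.append((name, every_feature - removed))
    return steps


# Hypothetical category-to-feature mapping (categories may overlap in general).
categories = {
    "voting": ["trump_vote_2020"],
    "masking": ["mask_always", "mask_never"],
    "citizenship": ["noncitizen_share"],
}
steps = cumulative_removal(categories, ["citizenship", "masking"])
```

Because categories are not mutually exclusive, withholding a category in this sketch drops any feature that belongs to it, even if the feature is also listed under another category.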

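County Vaccine Uptake Predictive Performance by Inclusion/Exclusion of Groups of Features

| Category | Count | Only this Category: RMSE | Only this Category: Δ Baseline | All but this Category: RMSE | All but this Category: Δ Baseline | Cumulatively Remove Categories: RMSE | Cumulatively Remove Categories: Δ Baseline |
|---|---|---|---|---|---|---|---|
| voting | 6 | 8.6 | -24.1% | 7.34 | -35.2% | | |
| housing | 2955 | 8.8 | -22.3% | 7.3 | -35.5% | 8.6 | -24.1% |
| masking | 5 | 9.33 | -17.6% | 7.21 | -36.3% | 7.88 | -30.4% |
| race | 8038 | 8.51 | -24.9% | 7.2 | -36.4% | 7.59 | -33.0% |
| healthcare_services | 4 | 9.73 | -14.1% | 7.18 | -36.6% | 7.26 | -35.9% |
| disabilities | 561 | | | 7.18 | -36.6% | 6.98 | -38.4% |
| economy_employment | 138 | 8.99 | -20.6% | 7.15 | -36.9% | 7.07 | -37.6% |
| transportation | 1803 | | | 7.15 | -36.9% | 7.23 | -36.2% |
| population | 7621 | 10.8 | -4.2% | 7.13 | -37.0% | 7.35 | -35.1% |
| family_structure | 2143 | 11.3 | -0.1% | 7.13 | -37.0% | 7.07 | -37.6% |
| mobility_safegraph | 6 | 9.78 | -13.6% | 7.11 | -37.2% | 7.16 | -36.8% |
| age | 8052 | 10.9 | -3.6% | 7.11 | -37.3% | 7.3 | -35.5% |
| insurance | 860 | 10.4 | -7.8% | 7.09 | -37.4% | 6.98 | -38.4% |
| amenities | 404 | | | 7.07 | -37.5% | 6.96 | -38.6% |
| religion | 143 | 8.91 | -21.4% | 7.07 | -37.6% | 6.92 | -38.9% |
| sex | 9968 | | | 7.06 | -37.7% | 6.94 | -38.7% |
| education | 1236 | 8.41 | -25.8% | 7.04 | -37.8% | 6.98 | -38.4% |
| households | 5090 | 8.56 | -24.5% | 7.03 | -38.0% | 6.82 | -39.8% |
| google_mobility | 3 | 9.49 | -16.2% | 7.02 | -38.0% | 6.85 | -39.5% |
| vaccine_supply | 7 | 11.1 | -1.8% | 7.01 | -38.1% | 6.79 | -40.1% |
| military | 1447 | 10.7 | -5.8% | 7 | -38.2% | 6.9 | -39.1% |
| longer_term_mobility | 761 | | | 7 | -38.2% | 6.86 | -39.5% |
| occupation | 4408 | 9.14 | -19.3% | 7 | -38.2% | 6.77 | -40.2% |
| vaccine_demand | 2 | 9.4 | -17.0% | 6.97 | -38.5% | 6.82 | -39.8% |
| wealth_employment | 6874 | 8.94 | -21.1% | 6.97 | -38.5% | 6.81 | -39.8% |
| marriage | 1503 | | | 6.97 | -38.5% | 6.74 | -40.5% |
| foreign_born | 1316 | 10.7 | -5.5% | 6.95 | -38.7% | 6.78 | -40.2% |
| covid | 4 | 10.4 | -8.0% | 6.93 | -38.8% | 6.82 | -39.8% |
| global_warming | 60 | 8.11 | -28.4% | 6.89 | -39.2% | 6.77 | -40.2% |
| health_conditions | 5 | 10.3 | -8.7% | 6.87 | -39.3% | 6.99 | -38.3% |
| citizenship | 591 | 9.9 | -12.6% | 6.84 | -39.6% | 7.16 | -36.8% |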

SHAP

| Feature | Folds | Sum | % | Cum. % |
|---|---|---|---|---|
| politics Trump Vote 2020 | 5 | 103 | 23.4% | 23.4% |
| politics Trump Vote 2016 | 4 | 42.7 | 9.7% | 33.1% |
| acs race total black or african american alone b1 003 acs5 | 3 | 18.5 | 4.2% | 37.3% |
| nyt masking masking always | 3 | 15.3 | 3.5% | 40.8% |
| acs hispanic or latino origin by race total not hispanic or latino black or african american alone b2 004 acs5 | 4 | 13.4 | 3.0% | 43.8% |
| bae gdp in real estate and rental and leasing ita | 4 | 11.6 | 2.6% | 46.5% |
| countyhealthrankings mammography screening | 2 | 11.5 | 2.6% | 49.1% |
| bae gdp in arts, entertainment, recreation, accommodation, and food services ita | 3 | 10 | 2.3% | 51.4% |
| acs tenure by occupants per room total owner occupied 0 50 or less occupants per room b4 003 acs5 | 3 | 9.57 | 2.2% | 53.6% |
| nyt masking masking never | 3 | 9 | 2.0% | 55.6% |
| acs people reporting multiple ancestry total lithuanian b5 053 acs5 | 3 | 8.4 | 1.9% | 57.5% |
| acs people reporting ancestry total eastern european b6 035 acs5 | 4 | 8.34 | 1.9% | 59.4% |
| acs monthly housing costs total 2 000 to 2 499 b4 014 acs5 | 3 | 8.23 | 1.9% | 61.3% |
| acs people reporting multiple ancestry total italian b5 051 acs5 | 2 | 7.61 | 1.7% | 63.0% |
| nyt masking masking rarely | 2 | 7.23 | 1.6% | 64.7% |
| politics johnmccain | 1 | 6.21 | 1.4% | 66.1% |
| acs tenure by house heating fuel total owner occupied fuel oil kerosene etc b7 006 acs5 | 2 | 5.94 | 1.4% | 67.4% |
| politics mittromney | 2 | 5.55 | 1.3% | 68.7% |
| acs value total 500 000 to 749 999 b5 023 acs5 | 2 | 5.39 | 1.2% | 69.9% |
| cew employed NAICS 92 Public administration Local Government ita | 3 | 5.36 | 1.2% | 71.1% |
| acs monthly housing costs total 3 000 or more b4 016 acs5 | 1 | 4.47 | 1.0% | 72.2% |
| acs aggregate price asked dollars aggregate price asked dollars b6 001 acs5 ita | 2 | 4.43 | 1.0% | 73.2% |
| acs people reporting ancestry total american b6 005 acs5 | 2 | 4.3 | 1.0% | 74.2% |
| bae gdp in professional and business services ita | 3 | 4.27 | 1.0% | 75.1% |
| acs people reporting multiple ancestry total polish b5 061 acs5 | 2 | 4.2 | 1.0% | 76.1% |
| politics georgewbush | 3 | 4.16 | 0.9% | 77.0% |
| acs plumbing facilities for occupied housing units total complete plumbing facilities b8 002 acs5 ita | 1 | 3.99 | 0.9% | 77.9% |
| acs plumbing facilities for all housing units total complete plumbing facilities b7 002 acs5 ita | 2 | 3.77 | 0.9% | 78.8% |
| acs plumbing facilities for occupied housing units total b8 001 acs5 ita | 2 | 3.76 | 0.9% | 79.6% |
| bae gdp in private services-providing industries 3/ ita | 2 | 3.72 | 0.8% | 80.5% |
| acs monthly housing costs total 300 to 399 b4 005 acs5 | 3 | 3.68 | 0.8% | 81.3% |
| acs tenure by plumbing facilities total owner occupied complete plumbing facilities b9 003 acs5 ita | 2 | 3.34 | 0.8% | 82.1% |
| acs house heating fuel total wood b0 007 acs5 | 2 | 2.87 | 0.7% | 82.7% |
| acs american indian and alaska native aian alone or in any combination by selected tribal groupings total groups tallied b7 001 acs5 | 1 | 2.78 | 0.6% | 83.4% |
| acs monthly housing costs total 2 500 to 2 999 b4 015 acs5 | 2 | 2.76 | 0.6% | 84.0% |
| acs people reporting multiple ancestry total austrian b5 019 acs5 | 2 | 2.66 | 0.6% | 84.6% |
| acs tenure by selected physical and financial conditions total renter occupied with one selected condition b3 009 acs5 | 2 | 2.61 | 0.6% | 85.2% |
| acs american indian and alaska native alone for selected tribal groupings total b4 001 acs5 | 2 | 2.58 | 0.6% | 85.8% |
| acs hispanic or latino origin by specific origin total hispanic or latino other hispanic or latino b1 027 acs5 | 2 | 2.53 | 0.6% | 86.4% |
| bae gdp in health care and social assistance ita | 2 | 2.53 | 0.6% | 86.9% |
| acs hispanic or latino origin by race total not hispanic or latino two or more races two races excluding some other race and three or more races b2 011 acs5 | 2 | 2.5 | 0.6% | 87.5% |
| bae gdp in arts, entertainment, and recreation ita | 2 | 2.42 | 0.6% | 88.1% |
| cew employed NAICS 31-33 Manufacturing Private | 2 | 2.42 | 0.5% | 88.6% |
| acs value total 1 000 000 to 1 499 999 b5 025 acs5 | 2 | 2.33 | 0.5% | 89.1% |
| bae gdp in professional and business services | 1 | 2.13 | 0.5% | 89.6% |
| acs monthly housing costs total 400 to 499 b4 006 acs5 | 1 | 2.06 | 0.5% | 90.1% |
| acs value total 70 000 to 79 999 b5 012 acs5 | 1 | 1.93 | 0.4% | 90.5% |
| acs contract rent total with cash rent 350 to 399 b6 009 acs5 | 2 | 1.92 | 0.4% | 91.0% |
| cew change employed NAICS 54 Professional and technical services Private | 2 | 1.69 | 0.4% | 91.4% |
| bae gdp in information ita | 2 | 1.63 | 0.4% | 91.7% |
| cew employed NAICS 44-45 Retail trade Private ita | 2 | 1.61 | 0.4% | 92.1% |
| politics georgewbush | 2 | 1.57 | 0.4% | 92.5% |
| acs people reporting single ancestry total french except basque b4 040 acs5 | 2 | 1.48 | 0.3% | 92.8% |
| acs people reporting single ancestry total finnish b4 039 acs5 | 2 | 1.19 | 0.3% | 93.1% |
| acs american indian and alaska native alone for selected tribal groupings total american indian tribes specified b4 002 acs5 | 1 | 1.08 | 0.2% | 93.3% |
| acs tenure by house heating fuel total renter occupied bottled tank or lp gas b7 014 acs5 | 2 | 1.04 | 0.2% | 93.5% |
| acs value total less than 10 000 b5 002 acs5 | 1 | 0.974 | 0.2% | 93.8% |
| cew change employed NAICS 44-45 Retail trade Private | 2 | 0.968 | 0.2% | 94.0% |
| acs value total 30 000 to 34 999 b5 007 acs5 | 2 | 0.935 | 0.2% | 94.2% |
| acs aggregate gross rent dollars by meals included in rent aggregate gross rent meals included in rent b7 002 acs5 ita | 2 | 0.896 | 0.2% | 94.4% |
| acs aggregate rent asked dollars aggregate rent asked b2 001 acs5 ita | 3 | 0.88 | 0.2% | 94.6% |
| acs people reporting multiple ancestry total french canadian b5 041 acs5 | 3 | 0.877 | 0.2% | 94.8% |
| acs plumbing facilities for all housing units total b7 001 acs5 ita | 2 | 0.83 | 0.2% | 95.0% |
| acs tenure by house heating fuel total renter occupied wood b7 018 acs5 | 1 | 0.821 | 0.2% | 95.2% |
| acs value total 40 000 to 49 999 b5 009 acs5 | 1 | 0.797 | 0.2% | 95.4% |
| acs tenure by plumbing facilities by occupants per room total owner occupied complete plumbing facilities 1 01 to 1 50 occupants per room b6 005 acs5 ita | 2 | 0.78 | 0.2% | 95.5% |
| cew employed NAICS 92 Public administration State Government ita | 2 | 0.715 | 0.2% | 95.7% |
| acs people reporting ancestry total belgian b6 021 acs5 | 1 | 0.691 | 0.2% | 95.9% |
| acs gross rent total with cash rent 750 to 799 b3 017 acs5 | 1 | 0.689 | 0.2% | 96.0% |
| cew employed NAICS 99 Unclassified Private ita | 2 | 0.68 | 0.2% | 96.2% |
| acs hispanic or latino origin by race total not hispanic or latino white alone b2 003 acs5 | 1 | 0.677 | 0.2% | 96.3% |
| bae gdp in educational services ita | 1 | 0.645 | 0.1% | 96.5% |
| acs housing unit response and nonresponse rates with reasons for noninterviews nonresponse rate refusal b1 003 acs5 | 2 | 0.621 | 0.1% | 96.6% |
| cew employed NAICS 92 Public administration Federal Government | 2 | 0.62 | 0.1% | 96.8% |
| acs tenure by occupants per room total owner occupied 1 51 to 2 00 occupants per room b4 006 acs5 | 2 | 0.598 | 0.1% | 96.9% |
| acs value total 10 000 to 14 999 b5 003 acs5 | 2 | 0.584 | 0.1% | 97.0% |
| acs people reporting ancestry total scotch irish b6 066 acs5 | 2 | 0.575 | 0.1% | 97.2% |
| bae gdp in retail trade ita | 1 | 0.571 | 0.1% | 97.3% |
| acs plumbing facilities for occupied housing units total lacking complete plumbing facilities b8 003 acs5 ita | 1 | 0.557 | 0.1% | 97.4% |
| acs people reporting single ancestry total american b4 005 acs5 | 1 | 0.536 | 0.1% | 97.5% |
| bae gdp in mining, quarrying, and oil and gas extraction | 2 | 0.52 | 0.1% | 97.7% |
| bae gdp in other services ita | 2 | 0.511 | 0.1% | 97.8% |

References

n.d. https://s8637.pcdn.co/wp-content/uploads/2021/02/Access-to-Potential-COVID-19-Vaccine-Administration-Facilities-2-2-2021.pdf.
Achen, Christopher H. 2005. “Let’s Put Garbage-Can Regressions and Garbage-Can Probits Where They Belong.” Conflict Management and Peace Science 22 (4): 327–39. https://doi.org/10.1080/07388940500339167.
Assistant Secretary for Public Affairs (ASPA). 2020. “COVID-19 Vaccine Distribution: The Process.” Text. HHS.gov. https://www.hhs.gov/coronavirus/covid-19-vaccines/distribution/index.html.
Barkay, Neta, Curtiss Cobb, Roee Eilat, Tal Galili, Daniel Haimovich, Sarah LaRocca, Katherine Morris, and Tal Sarig. 2020. “Weights and Methodology Brief for the COVID-19 Symptom Survey by University of Maryland and Carnegie Mellon University, in Partnership with Facebook.” arXiv:2009.14675 [Cs], October. https://arxiv.org/abs/2009.14675.
Bartsch, Sarah M, Patrick T Wedlock, Kelly J O’Shea, Sarah N Cox, Ulrich Strych, Jennifer B Nuzzo, Marie C Ferguson, et al. 2021. “Lives and Costs Saved by Expanding and Expediting COVID-19 Vaccination.” The Journal of Infectious Diseases, no. jiab233 (May). https://doi.org/10.1093/infdis/jiab233.
Brown, Clare C., Sean G. Young, and George C. Pro. 2021. “COVID-19 Vaccination Rates Vary by Community Vulnerability: A County-Level Analysis.” Vaccine 39 (31): 4245–49. https://doi.org/10.1016/j.vaccine.2021.06.038.
Bzdok, Danilo, Denis Engemann, and Bertrand Thirion. 2020. “Inference and Prediction Diverge in Biomedicine.” Patterns 1 (8). https://doi.org/10.1016/j.patter.2020.100119.
CDC. 2020a. “Coronavirus Disease 2019 (COVID-19).” Centers for Disease Control and Prevention. https://www.cdc.gov/coronavirus/2019-ncov/science/forecasting/forecasting-us.html.
CDC. 2020b. “COVID Data Tracker.” Centers for Disease Control and Prevention. https://covid.cdc.gov/covid-data-tracker.
Chernyavskiy, Pavel, Jeanita W. Richardson, and Sarah J. Ratcliffe. 2021. “COVID-19 Vaccine Uptake in United States Counties: Geospatial Vaccination Patterns and Trajectories Towards Herd Immunity.” medRxiv, May, 2021.05.28.21257946. https://doi.org/10.1101/2021.05.28.21257946.
“COVID-19 Vaccinations in the United States, County | Data | Centers for Disease Control and Prevention.” n.d. https://data.cdc.gov/Vaccinations/COVID-19-Vaccinations-in-the-United-States-County/8xkx-amqh.
“COVID-19 Vaccine Allocations.” n.d. Texas Department of State Health Services. https://www.dshs.state.tx.us/coronavirus/immunize/vaccineallocations.aspx.
“Data.” n.d. http://www.vaccinetracking.us/data.html.
“Data Definitions | Covid Act Now.” n.d. https://apidocs.covidactnow.org/data-definitions/.
Eggers, Andrew C., Guadalupe Tuñón, and Allan Dafoe. 2021. “Placebo Tests for Causal Inference.” Working paper. URL: link.
Figueiredo, Alexandre de, Clarissa Simas, Emilie Karafillakis, Pauline Paterson, and Heidi J. Larson. 2020. “Mapping Global Trends in Vaccine Confidence and Investigating Barriers to Vaccine Uptake: A Large-Scale Retrospective Temporal Modelling Study.” The Lancet 396 (10255): 898–908. https://doi.org/10.1016/S0140-6736(20)31558-0.
Hughes, Michelle M., Alice Wang, Marissa K. Grossman, Eugene Pun, Ari Whiteman, Li Deng, Elaine Hallisey, et al. 2021. “County-Level COVID-19 Vaccination Coverage and Social Vulnerability – United States, December 14, 2020–March 1, 2021.” Morbidity and Mortality Weekly Report 70 (12): 431–36. https://doi.org/10.15585/mmwr.mm7012e1.
Kang, Yuhao, Song Gao, Yunlei Liang, Mingxiao Li, Jinmeng Rao, and Jake Kruse. 2020. “Multiscale Dynamic Human Mobility Flow Dataset in the U.S. During the COVID-19 Epidemic.” Scientific Data 7 (1): 390. https://doi.org/10.1038/s41597-020-00734-5.
Kelly, Bridget J., Brian G. Southwell, Lauren A. McCormack, Carla M. Bann, Pia D. M. MacDonald, Alicia M. Frasier, Christine A. Bevc, Noel T. Brewer, and Linda B. Squiers. 2021. “Predictors of Willingness to Get a COVID-19 Vaccine in the U.S.” BMC Infectious Diseases 21 (1): 338. https://doi.org/10.1186/s12879-021-06023-9.
Lei, Yuxiao. 2021. “Hyper Focusing Local Geospatial Data to Improve COVID-19 Vaccine Equity and Distribution.” Journal of Urban Health, June. https://doi.org/10.1007/s11524-021-00552-z.
Lundberg, Scott M., Gabriel Erion, Hugh Chen, Alex DeGrave, Jordan M. Prutkin, Bala Nair, Ronit Katz, Jonathan Himmelfarb, Nisha Bansal, and Su-In Lee. 2020. “From Local Explanations to Global Understanding with Explainable AI for Trees.” Nature Machine Intelligence 2 (1): 56–67. https://doi.org/10.1038/s42256-019-0138-9.
Mishra, Anubhuti, Staci Sutermaster, Peter Smittenaar, Nicholas Stewart, and Sema K. Sgaier. 2021. “COVID-19 Vaccine Coverage Index: Identifying Barriers to COVID-19 Vaccine Uptake Across U.S. Counties.” medRxiv, June, 2021.06.17.21259116. https://doi.org/10.1101/2021.06.17.21259116.
Parvandeh, Saeid, Hung-Wen Yeh, Martin P Paulus, and Brett A McKinney. 2020. “Consensus Features Nested Cross-Validation.” Bioinformatics 36 (10): 3093–98. https://doi.org/10.1093/bioinformatics/btaa046.
Pathak, Elizabeth B., Janelle Menard, and Rebecca Garcia. 2021. “Population Age-Ineligible for COVID-19 Vaccine in the United States: Implications for State, County, and Race/Ethnicity Vaccination Targets.” medRxiv, February, 2021.02.11.21251562. https://doi.org/10.1101/2021.02.11.21251562.
Pitts, Peter J., and Emily Freeman. 2021. “Health Literacy: The Common Denominator of Healthcare Progress.” The Patient - Patient-Centered Outcomes Research, July. https://doi.org/10.1007/s40271-021-00537-9.
Roberts, David R., Volker Bahn, Simone Ciuti, Mark S. Boyce, Jane Elith, Gurutzeta Guillera-Arroita, Severin Hauenstein, et al. 2017. “Cross-Validation Strategies for Data with Temporal, Spatial, Hierarchical, or Phylogenetic Structure.” Ecography 40 (8): 913–29. https://doi.org/10.1111/ecog.02881.
Ruck, Damian J., R. Alexander Bentley, and Joshua Borycz. 2021. “Early Warning of Vulnerable Counties in a Pandemic Using Socio-Economic Variables.” Economics & Human Biology 41 (May): 100988. https://doi.org/10.1016/j.ehb.2021.100988.
Schmidt, Harald, Rebecca Weintraub, Michelle A. Williams, Kate Miller, Alison Buttenheim, Emily Sadecki, Helen Wu, et al. 2021. “Equitable Allocation of COVID-19 Vaccines in the United States.” Nature Medicine, May, 1–10. https://doi.org/10.1038/s41591-021-01379-6.
Stewart, Nicholas, Peter Smittenaar, Staci Sutermaster, Lindsay Coome, and Sema Sgaier. 2021. “Inequities Among Vulnerable Communities During the COVID-19 Vaccine Rollout.” medRxiv, June, 2021.06.15.21258978. https://doi.org/10.1101/2021.06.15.21258978.
Thompson, William Hedley, Jessey Wright, Patrick G Bissett, and Russell A Poldrack. 2020. “Dataset Decay and the Problem of Sequential Analyses on Open Datasets.” Edited by Peter Rodgers, Chris I Baker, Nick Holmes, and Guillaume A Rousselet. eLife 9 (May): e53498. https://doi.org/10.7554/eLife.53498.
Truong, Judy, Simran Bakshi, Aghna Wasim, Mobeen Ahmad, and Umair Majid. 2021. “What Factors Promote Vaccine Hesitancy or Acceptance During Pandemics? A Systematic Review and Thematic Analysis.” Health Promotion International, no. daab105 (July). https://doi.org/10.1093/heapro/daab105.

  1. We hierarchically cluster interstate trips using Ward’s distance and cut the dendrogram at 6 clusters. New England presents as a distinct cluster but contains only 45 counties, so we collapse it into the North East fold, resulting in 5 folds in total.
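
The fold construction described in this footnote can be illustrated with a minimal sketch. It assumes each state is summarized by a vector of trip shares and uses a plain-Python version of Ward's merging criterion. The function names `ward_clusters` and `collapse_fold`, the toy profiles, and the choice of which fold is merged into which are all hypothetical and are not the code used for the article.

```python
from itertools import combinations


def ward_clusters(points, n_clusters):
    """Agglomerate `points` with Ward's criterion until `n_clusters` remain.

    Returns a list of clusters, each a sorted list of row indices.
    """
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            na, nb = len(clusters[a]), len(clusters[b])
            sq = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
            # Increase in within-cluster sum of squares if a and b are merged.
            cost = na * nb / (na + nb) * sq
            if best is None or cost < best[0]:
                best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        # The merged centroid is the size-weighted mean of the two centroids.
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b], centroids[b]  # b > a, so index a is unaffected
    return clusters


def collapse_fold(clusters, small, target):
    """Merge cluster `small` into cluster `target` (e.g. a fold with too few counties)."""
    merged = [c for i, c in enumerate(clusters) if i not in (small, target)]
    merged.append(sorted(clusters[small] + clusters[target]))
    return merged


# Toy trip-share profiles (assumed data): 12 hypothetical states,
# each sending most of its trips to one of 6 regions.
profiles = []
for state in range(12):
    row = [0.02] * 6
    row[state // 2] = 0.90 - 0.01 * (state % 2)
    profiles.append(row)

folds6 = ward_clusters(profiles, n_clusters=6)       # cut at 6 clusters
folds5 = collapse_fold(folds6, small=0, target=1)    # collapse one small fold
```

In practice, `scipy.cluster.hierarchy.linkage(..., method="ward")` followed by `fcluster(..., criterion="maxclust")` is a common way to perform the same clustering and dendrogram cut on real mobility data.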